Beyond Wideband Telephony - Bandwidth Extension for Super-Wideband Speech

Authors

  • Bernd Geiser
  • Peter Vary
Abstract

Driven by the market success of high-quality Voice over IP technology, the introduction of wideband telephony with an acoustic bandwidth of at least 7 kHz is now also foreseen for “traditional” digital voice communication services such as ISDN, DECT, or UMTS. While wideband speech addresses the basic requirement of intelligibility (even for meaningless syllables), the perceived “naturalness” and the experienced “quality” of speech can be further enhanced by providing an even larger acoustic bandwidth. Thus, the next logical step towards true “Hi-Fi Telephony” could be the rendering of “super-wideband” (SWB) speech signals with an acoustic bandwidth of at least 14 kHz. In this contribution, we review previous and current standardization activities with this focus. Moreover, a method for artificial bandwidth extension (BWE) of wideband speech signals towards “super-wideband” is presented and evaluated. It is shown that improved naturalness and speech quality can be attained by a purely receiver-based modification of wideband terminals.

Wideband vs. Super-Wideband Speech

Typically, wideband (WB) speech is defined by its acoustic frequency range of 0.05–7 kHz, whereas super-wideband speech provides a roughly doubled bandwidth of, e.g., 0.05–14 kHz. The lower cutoff frequency of 50 Hz is usually considered sufficient for a natural reproduction of speech signals. An analysis reveals that on average only about 1.5% of the energy of super-wideband speech signals is located in the 7–14 kHz extension band (EB). This average is exceeded in fewer than 25% of all active frames, which indicates that there must be strong outliers in the EB to SWB energy ratio $\sigma^2_{\mathrm{EB}} / \sigma^2_{\mathrm{SWB}}$. Such outliers are found in fricative and plosive speech sounds, as illustrated in Fig. 1. For particularly strong outliers, the EB energy is even larger than the WB energy. This is the case for about 6% of all active frames, and it is here that the largest benefits over WB signals can be expected.
In addition to simple energy considerations, there is also evidence that temporal signal characteristics gain perceptual importance with increasing frequency, cf. [1, 2]. For the EB range of 7–14 kHz, detailed temporal characteristics may be even more important than the exact reproduction of the spectral envelope information.

Results are based on approx. 1.6 · 10 active 20 ms speech frames, sampled at 32 kHz and low-pass filtered with fc = 14 kHz.
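
The frame-wise energy ratio discussed above can be illustrated with a short sketch. It is not the authors' analysis code: the function name `eb_to_swb_energy_ratio`, the use of a plain DFT with a rectangular window, and the band edges passed as defaults are assumptions chosen to match the figures quoted in the text (20 ms frames, 32 kHz sampling, 0.05–7 kHz WB, 7–14 kHz EB). No voice activity detection is included, so selecting "active" frames is left to the caller.

```python
import numpy as np


def eb_to_swb_energy_ratio(x, fs=32000, frame_ms=20,
                           f_lo=50.0, f_split=7000.0, f_hi=14000.0):
    """Per-frame ratio of extension-band energy to super-wideband energy.

    Hypothetical helper: the EB is taken as (f_split, f_hi] and the SWB
    range as [f_lo, f_hi]; energies are summed from the squared DFT magnitude.
    """
    x = np.asarray(x, dtype=float)
    n = int(fs * frame_ms / 1000)            # 640 samples per frame at 32 kHz
    n_frames = len(x) // n                   # drop the incomplete last frame
    frames = x[: n_frames * n].reshape(n_frames, n)

    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # bin spacing is 50 Hz here

    swb_bins = (freqs >= f_lo) & (freqs <= f_hi)
    eb_bins = (freqs > f_split) & (freqs <= f_hi)

    e_swb = power[:, swb_bins].sum(axis=1)
    e_eb = power[:, eb_bins].sum(axis=1)

    # Silent frames (zero SWB energy) get a ratio of 0 instead of NaN.
    return np.divide(e_eb, e_swb, out=np.zeros_like(e_eb), where=e_swb > 0)


if __name__ == "__main__":
    fs = 32000
    t = np.arange(fs) / fs
    # Synthetic stand-in for speech: a 1 kHz tone plus a weaker 10 kHz tone.
    signal = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 10000 * t)
    ratio = eb_to_swb_energy_ratio(signal, fs=fs)
    print(f"mean EB/SWB energy ratio: {ratio.mean():.3f}")
    print(f"share of frames above the mean: {(ratio > ratio.mean()).mean():.2%}")
```

On real speech, the mean of the returned array over active frames corresponds to the average EB energy share, and the fraction of frames with a ratio above 0.5 corresponds to the frames in which the EB energy exceeds the WB energy.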

Related resources

From Narrowband Telephony to Wideband Telephony

The restricted audio quality of today’s telephone networks is mainly due to the narrowband (NB) limitation to the frequency range from about 300 Hz to 3.4 kHz. Meanwhile, codecs for wideband (WB) telephony (50 Hz to 7 kHz) exist with significantly improved speech intelligibility and naturalness. However, the broad introduction of wideband speech coding will require strong efforts of both networ...

WTIMIT: The TIMIT Speech Corpus Transmitted Over The 3G AMR Wideband Mobile Network

In anticipation of upcoming mobile telephony services with higher speech quality, a wideband (50 Hz to 7 kHz) mobile telephony derivative of TIMIT, called WTIMIT, has been recorded. It opens up various scientific investigations; e.g., on speech quality and intelligibility, as well as on wideband upgrades of network-side interactive voice response (IVR) systems with retrained or bandwidth-extended...

Artificial Bandwidth Extension of Wideband Speech by Pitch-Scaling of Higher Frequencies

In this paper, a simple DFT-domain pitch-scaling technique is used to extend the audio bandwidth of wideband speech (50 Hz – 7 kHz) to the super-wideband range (50 Hz – 12 kHz). To this end, the higher frequencies of the wideband signal (6 – 7 kHz) are pitch-scaled with a scaling factor of four and the resulting scaled signal is inserted into the 8 – 12 kHz band. A subjective listening test has be...

Audio bandwidth extension using ensemble of recurrent neural networks

In audio communication systems, the perceptual quality of reproduced audio signals, such as the naturalness of the sound, is limited by the available audio bandwidth. In this paper, a wideband to super-wideband audio bandwidth extension method is proposed using an ensemble of recurrent neural networks. The feature space of wideband audio is first divided into different regions through...

Conversational quality estimation model for wideband IP-telephony services

As broadband and high-speed IP networks spread, IP-telephony services have become a popular speech communication application over IP networks. Recently, the speech quality of IP-telephony services has become close to that of conventional PSTN services. To provide better speech quality to users, speech communication with wider bandwidth (e.g., 7 kHz) is one of the most promising applications. To...

Publication date: 2008